Electronic Nose Drift Suppression Based on Smooth Conditional Domain Adversarial Networks

Anti-drift is a new and serious challenge in the field of gas sensors. Gas sensor drift causes the probability distribution of the measured data to diverge from that of the calibration data, which causes the original classification algorithm to fail. To make the probability distributions of the drifted data and the regular data consistent, we introduce the Conditional Adversarial Domain Adaptation Network (CDAN) combined with the Sharpness Aware Minimization (SAM) optimizer, a state-of-the-art deep transfer learning method. The core approach involves constructing feature extractors and domain discriminators designed to extract shared features from both drift and clean data. These extracted features are subsequently input into a classifier, thereby improving the overall model's generalization capability. The method has three key advantages: (1) It implements semi-supervised learning, so no labels are needed for the drift data. (2) Unlike conventional deep transfer learning methods such as the Domain-adversarial Neural Network (DANN) and Wasserstein Domain-adversarial Neural Network (WDANN), it accommodates inter-class correlations. (3) It is easier to train and converges more readily than traditional deep transfer learning networks. Through experiments on two publicly available datasets, we demonstrate the efficiency and effectiveness of the proposed anti-drift method compared with state-of-the-art techniques.


Introduction
Gas sensors play a vital role across diverse industrial sectors, including environmental surveillance [1][2][3], medical diagnostics [4][5][6], food analytics [7,8], and explosive detection [9,10]. Over the past two decades, significant strides have been made in gas sensor technology to meet the practical demands of various applications. For instance, Fort and colleagues proposed three measurement methodologies to effectively differentiate gas mixtures [11], enabling a more precise categorization of wines. This empowers industries to ensure the quality and authenticity of their products. Bhattacharyya et al. introduced a computational framework integrating a cost-effective interface and a wide-range, low-value resistive sensor [12,13]. This architecture can assess the quality of unidentified tea samples, providing an economical and efficient solution for the tea industry. In another notable development, Brezmes et al. designed a sensor system specifically for measuring fruit ripeness, tailored to application-specific requirements [14]. This system enables a precise and timely evaluation of fruit maturity, assisting in the optimization of harvesting and storage operations. In summary, advancements in gas sensor technology have significantly improved the capability to detect and analyze gases across various industries. These innovations have led to more accurate and reliable outcomes, ultimately enhancing productivity and safety in these sectors. However, since gas sensors work by detecting the change in resistance and voltage of the gas-sensitive material when it is exposed to the gas to be measured, the sensor sensitivity can be affected by factors such as temperature, humidity, pressure, self-aging, and poisoning. Changes in sensor sensitivity can lead to fluctuations in sensor response when the electronic nose is exposed to the same gas at different times, a phenomenon called sensor drift [15]. This paper focuses on the drift compensation of gas sensors.
To tackle this problem, researchers have approached it from three different perspectives. The first approach involves developing gas-sensitive materials that exhibit both high performance and high stability. However, this necessitates breakthroughs in multiple disciplines like physics, chemistry, and materials science, and can be quite costly. Another approach involves enhancing the stability of the gas sensor by modifying its operating mode, such as periodically adjusting the heating voltage. Nevertheless, these two strategies mainly address short-term drift phenomena and have limited impact on long-term drift issues.
To combat long-term drift problems, many researchers have focused on modifying the signal-processing algorithms used in gas sensors. These algorithms are typically classified into three groups: data-level, feature-level, and classifier-level drift compensation methods.

1. Data-level approaches:

2. Feature-level methods: These approaches aim to align source data (clean data) and target data (drift data) in a shared subspace, minimizing distribution divergence between them. L. Zhang proposed Domain Regularized Component Analysis (DRCA), which reduces marginal distribution divergence between clean and drift data within the common subspace [20]. An extension of DRCA, Local Discriminant Subspace Projection (LDSP), seeks to identify a common subspace that simultaneously reduces local within-class variance of projected source samples and maximizes local between-class variance [21]. Another approach, named Common Subspace-Based Drift (CSBD), minimizes distribution divergence between clean and drift data within a new subspace [22].

3. Classifier-level techniques: The performance of a classifier significantly impacts the resulting classification [23]. Zhang and Zhang introduced two gas drift correction methods based on Extreme Learning Machines, both of which provide low computational complexity [24]. In recent years, online drift compensation methods have been introduced to address sensor drift [25][26][27]. Expanding on the concept of active learning, the method (referred to as AL-ISSMK) developed by Liu et al. [26] identifies the most valuable samples and retrains the classifier to adapt to evolving sensor drift.
While the adaptive correction methods mentioned above have shown promising results in compensating for drift in gas sensor arrays, there remain three areas that require further improvement: (1) Low classification accuracy persists, with most methods achieving rates below 90%. (2) Many approaches rely on labeled data from drifted sensors to enhance accuracy, but obtaining these labels is costly as it involves recalibrating the sensors. (3) Several methods require an excessive number of hyperparameters, limiting their practicality for real-world applications in production and daily life.
To address the previously mentioned challenges, we present the CDAN+SAM model. In this model, CDAN is devised to extract common features from both clean and drifted data. These extracted features are subsequently input into a neural network to train a more generalized and robust classifier. The SAM optimizer plays a crucial role in smoothing the training process, facilitating easier network training and convergence. The fundamental structure of the CDAN+SAM model is illustrated in Figure 1. The remainder of this paper is organized as follows: The second section provides an introduction to the foundational theory of transfer learning, offering insights into the principles underlying CDAN and SAM. In the third section, we conduct a comprehensive analysis of experimental results and perform ablation experiments to further validate our approach. Finally, the fourth section summarizes the key findings and conclusions of this paper.

Transfer Learning
The domain and task are the foundational concepts in transfer learning. Given a source domain $D_S$ paired with a corresponding source task $T_S$ and a target domain $D_T$ with its associated task $T_T$, transfer learning aims to improve the predictive function $f_T(\cdot)$ for the target by leveraging relevant information from $D_S$ and $T_S$, where $D_S \neq D_T$ or $T_S \neq T_T$ [28].
Evidently, the target domain $D_T$ (drift data) and the source domain $D_S$ (clean data) exhibit differences in their feature distributions due to sensor drift. Consequently, a classifier trained on clean data becomes unreliable when applied to drift data. Although both domains measure the same gases, and thus share the same category space ($Y_s = Y_t$), inconsistencies arise in the marginal and conditional probability distributions between the two domains. The objective of transfer learning is to train a classifier using clean data to accurately predict the labels of drift data.
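The effect of a shifted feature distribution on a classifier trained only on clean data can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the one-dimensional readings, the additive offset `DRIFT`, and the nearest-centroid classifier are toy stand-ins, not the datasets or models used in this paper.

```python
def fit_centroids(samples):
    """Return the mean reading per class label from (reading, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign the label whose centroid is closest to the reading x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(1 for x, y in samples if predict(centroids, x) == y)
    return hits / len(samples)


# Hypothetical clean (source) readings for two gases, labelled 0 and 1.
source = [(0.9, 0), (1.0, 0), (1.1, 0), (1.9, 1), (2.0, 1), (2.1, 1)]

# Hypothetical drift: every reading is shifted by a constant offset.
DRIFT = 0.7
target = [(x + DRIFT, y) for x, y in source]

centroids = fit_centroids(source)      # trained on clean data only
source_acc = accuracy(centroids, source)
target_acc = accuracy(centroids, target)
print(f"source accuracy: {source_acc:.2f}, target accuracy: {target_acc:.2f}")
```

In this toy case the category space is unchanged, yet the classifier misreads every drifted sample of the first gas, which is the situation domain adaptation is meant to correct.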

Conditional Adversarial Domain Adaptation Network (CDAN)
Deep transfer learning has emerged as a prominent research direction within the field of transfer learning. Researchers are increasingly focused on training domain-invariant classifiers in deep networks to enhance the generalization capabilities of transfer learning methods across diverse data distributions. Adversarial learning has been integrated into deep networks to facilitate the learning of disentangled and transferable representations for domain adaptation. In comparison to other deep transfer methods, conditional adversarial domain adaptation considers not only the inherent correlation within the original data but also the relationships between different categories.
This method is conceptualized as a minimax optimization problem involving two competing error terms: (a) Minimizing the error of the classifier trained on source domain data and source domain labels ensures improved classifier performance on the source domain data. (b) Maximizing the error of a domain discriminator trained with both source and target data is designed to confuse the discriminator regarding whether the data originates from the source or target domain.
The optimization objective is a minimax problem for training the feature extraction model G, which must minimize the empirical risk, i.e., the classification error, on the source domain data. Simultaneously, G is required to maximize the loss incurred by the domain discriminator model. The discriminator D is trained to determine whether a sample comes from the source domain dataset or the target domain dataset, so G must produce features that confuse it. The entropy of the domain discrimination model serves as a quantitative measure of sample transferability. Additionally, conditional entropy is employed as a metric for transferability, and the entropy of the sample prediction vector is used as the transfer weight for the input of the domain discriminator. Conditional adversarial domain adaptation holds that the transferability of a sample is reflected in its category confidence: samples with higher category confidence (more clearly labeled) transfer better. The entropy of the domain discrimination result is also incorporated as a weight for the classification loss originating from the source domain samples.
At this point, we have formulated the objective function for transfer-weight-based conditional adversarial domain adaptation, which shares a similar structure with the generative adversarial model. Notably, there are two distinctive features: (1) The predicted category vectors are used to enhance the performance of the domain discriminator. (2) The predicted category vector serves as a measure of sample transferability at the input of the domain discriminator.
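For reference, the objective described in this section can be written out as follows. This is a reconstruction under assumptions, following the standard entropy-weighted CDAN formulation rather than an equation recovered from this text: $L$ denotes the cross-entropy loss, $c$ the category prediction vector, $h = (c, f)$ the joint variable fed to the discriminator $D$, and $w(\cdot)$ the entropy-based weight.

```latex
% Assumed notation (not taken from this text): L, w, H, and the superscripts s/t.
\[
\begin{aligned}
\mathcal{E}(G) &= \mathbb{E}_{(x^{s}, y^{s}) \sim D_S}\, L\big(c^{s}, y^{s}\big), \\
\mathcal{E}(D, G) &= -\,\mathbb{E}_{x^{s} \sim D_S}\, w\big(H(c^{s})\big) \log D(h^{s})
                   \;-\; \mathbb{E}_{x^{t} \sim D_T}\, w\big(H(c^{t})\big) \log\big(1 - D(h^{t})\big), \\
&\min_{G}\; \mathcal{E}(G) - \lambda\, \mathcal{E}(D, G), \qquad \min_{D}\; \mathcal{E}(D, G), \\
w\big(H(c)\big) &= 1 + e^{-H(c)}, \qquad H(c) = -\sum_{k} c_{k} \log c_{k}.
\end{aligned}
\]
```

Under this form, confident predictions (low entropy) receive a weight close to 2 and uncertain ones a weight close to 1, so the adversarial term is dominated by samples that are easier to transfer.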
In this objective function, λ is the trade-off hyperparameter balancing the source domain classification loss and the domain discrimination loss. The joint variable h = (c, f) integrates the feature vector f and the category prediction vector c for a specific domain, commonly through the multilinear operation h = f ⊗ c. The structural difference between the conditional adversarial domain adaptation network and the traditional domain adversarial network is illustrated in Figure 2. In the traditional domain adversarial network, the feature is directly fed into the domain discriminator, whereas the conditional adversarial network feeds the outer product of the prediction vector and the feature vector into the domain discriminator. The entropy of the prediction vector (depicted by the dashed line) is also used as a weight for the adversarial loss, emphasizing the samples that are more likely to transfer.
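The two ingredients that distinguish the conditional discriminator input, the multilinear map h = f ⊗ c and the entropy-based weight, can be sketched in a few lines of plain Python. The function names are hypothetical, and the weight form 1 + exp(-H) is an assumption taken from the standard CDAN formulation rather than from this text.

```python
import math


def multilinear_map(f, c):
    """Flattened outer product f ⊗ c: each feature scaled by each class probability."""
    return [fi * cj for fi in f for cj in c]


def entropy(c):
    """Shannon entropy (natural log) of a prediction vector c."""
    return -sum(p * math.log(p) for p in c if p > 0.0)


def entropy_weight(c):
    """Assumed weight w = 1 + exp(-H(c)): confident predictions get larger weights."""
    return 1.0 + math.exp(-entropy(c))


# Hypothetical 2-dimensional feature and 2-class prediction for one sample.
feature = [1.0, 2.0]
prediction = [0.25, 0.75]

h = multilinear_map(feature, prediction)   # discriminator input, length 2 * 2
w = entropy_weight(prediction)             # weight of this sample in the adversarial loss
print(h, round(w, 3))
```

The discriminator input grows as (feature dimension) × (number of classes), which is cheap for a 6-class gas problem but is the reason randomized multilinear maps are sometimes used for large feature vectors.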

Smoothness in Domain Adversarial Training
Recently, numerous studies have explored the implications of integrating formulations that enhance smoothness into the domain adversarial training framework. This methodology incorporates a dual objective, comprising the primary task's loss (such as classification or regression) and adversarial components. Researchers have observed that striving for convergence towards a smooth minimum with respect to the task loss stabilizes the adversarial training process, leading to enhanced performance in the target domain. Conversely, their analysis suggests that pursuing convergence towards smooth minima in adversarial loss may result in suboptimal generalization in the target domain.
Building on these insights, we introduce the Sharpness Aware Minimization (SAM) optimizer, a method designed to boost the performance of domain adversarial methods in electronic nose drift compensation tasks. The fundamental idea behind SAM is to identify a smoother minimum (i.e., low loss in the $\epsilon$ neighborhood of $\theta$) by using the following formally defined objective:
$$\min_{\theta} \max_{\|\epsilon\|_2 \le \rho} L_{obj}(\theta + \epsilon)$$
Here, $L_{obj}$ represents the objective function to be minimized, and $\rho \ge 0$ is a hyperparameter that sets the maximum norm for $\epsilon$. Given the inherent difficulty in obtaining the exact solution for the inner maximization, SAM maximizes the first-order approximation instead:
$$\hat{\epsilon}(\theta) \approx \arg\max_{\|\epsilon\|_2 \le \rho} \left[ L_{obj}(\theta) + \epsilon^{\top} \nabla_{\theta} L_{obj}(\theta) \right] = \rho \, \frac{\nabla_{\theta} L_{obj}(\theta)}{\|\nabla_{\theta} L_{obj}(\theta)\|_2}$$
The term $\hat{\epsilon}(\theta)$ is added to the weights $\theta$, and the gradient update for $\theta$ is subsequently computed as $\nabla_{\theta} L_{obj}(\theta)\big|_{\theta + \hat{\epsilon}(\theta)}$. The outlined procedure can be regarded as a universal smoothness-enhancing formulation applicable to any $L_{obj}$. We similarly introduce the concept of sharpness-aware source risk, i.e., the worst-case source risk within the $\rho$-neighborhood of $\theta$, to identify a smooth minimum. The optimization objective of the proposed Smooth Domain Adversarial Training is the sum of two terms: the first term represents the sharpness-aware risk, while the second term corresponds to the discrepancy term, which, notably, is not smoothed in our approach. The flowchart of the CDAN+SAM implementation is shown in Figure 3.
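The two-stage SAM update described above (perturb the weights along the normalized gradient, then descend using the gradient taken at the perturbed weights) can be sketched on a toy problem. This is a minimal illustration under assumptions: the quadratic `toy_loss`, its `CURVATURE` values, the learning rate, and ρ are all hypothetical, and in the actual model the step would be applied to the source classification loss only, not to the adversarial term.

```python
import math


def sam_perturbation(grad, rho):
    """First-order worst-case perturbation: rho * grad / ||grad||_2."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0 for _ in grad]      # already at a stationary point
    return [rho * g / norm for g in grad]


def sam_step(theta, grad_fn, lr, rho):
    """One SAM update: the gradient is evaluated at theta + eps, then applied to theta."""
    eps = sam_perturbation(grad_fn(theta), rho)
    perturbed = [t + e for t, e in zip(theta, eps)]
    sharp_grad = grad_fn(perturbed)
    return [t - lr * g for t, g in zip(theta, sharp_grad)]


# Hypothetical objective: an anisotropic quadratic bowl.
CURVATURE = [1.0, 10.0]


def toy_loss(theta):
    return 0.5 * sum(a * t * t for a, t in zip(CURVATURE, theta))


def toy_grad(theta):
    return [a * t for a, t in zip(CURVATURE, theta)]


theta = [1.0, 1.0]
initial_loss = toy_loss(theta)
for _ in range(200):
    theta = sam_step(theta, toy_grad, lr=0.05, rho=0.05)
final_loss = toy_loss(theta)
print(f"loss: {initial_loss:.3f} -> {final_loss:.5f}")
```

Each step costs two gradient evaluations instead of one, which is the main practical overhead of SAM compared with plain SGD.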

Result and Discussion
To assess the efficacy of CDAN+SAM, we conducted a comparative analysis with various deep transfer learning methods using two publicly available sensor drift datasets as benchmarks. ResNet served as the feature extraction network in this model. The experimental configurations are described in the following subsections. The experiments were run in PyCharm, and the hardware specifications are as follows: Windows 10 operating system, Intel Core i7-10300H CPU @ 3.40 GHz, 32.0 GB RAM, GTX 3080 GPU, and a 2 TB SSD.

Experiment on Sensor Drift Dataset A
Dataset A used in Experiment 1 is from UCSD [23]. It covers 6 types of gases measured with 16 gas sensors (TGS2600, TGS2602, TGS2610, and TGS2620; 4 of each). Each sensor contributes 8 features (2 rising-edge features, 3 falling-edge features, and 3 smooth-state features), giving a 128-dimensional feature vector per sample. The dataset contains a total of 13,910 samples divided into 10 batches. The data were recorded from January 2008 to the end of February 2011, spanning 3 years. Table 1 shows the details of the dataset, and the scatter plot in Figure 4 shows the principal component analysis (PCA) of the dataset. We take Batch 1 as the source domain for model training and test on Batch K, K = 2, ..., 10 (target domains). The classification accuracy on Batch K is reported.
To verify the effectiveness of the algorithms, 14 methods of 3 types are selected for comparison in this paper: drift compensation methods, traditional transfer learning methods, and deep transfer learning methods. SVM-rbf, OSC, CC-PCA, GLSW [29], DS [30], and DRCA are drift compensation methods, which are capable of identifying and calibrating drift components. Geodesic flow kernel (GFK) [31], TCA [32], and JDA [33] are traditional transfer learning methods, which change the probability distribution of the data in order to improve recognition accuracy. DANN [34], WDANN [35], and MADA [36] are deep transfer learning methods and represent mainstream approaches for deep domain adaptation.
Experiments were conducted on sensor drift Dataset A, and the recognition results for different methods under the experimental setting are presented in Table 2 and Figure 5. It is observed that the proposed CDAN+SAM achieves the best classification performance. The average classification accuracy is 90.32%, which is 7.27% higher than the second-best learning method. Furthermore, for each batch, the parameters with which the proposed method achieves the highest accuracy are provided in Table 3. The feature extraction network is ResNet-18. Since the features of Dataset A are 128-dimensional, a deeper network is needed to extract the features.
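The evaluation protocol used here (fit once on the source batch, score every later batch, then average) can be sketched as follows. The sketch is illustrative only: `evaluate_protocol`, the threshold classifier, and `toy_batches` are hypothetical stand-ins, not the CDAN+SAM model or the actual dataset.

```python
def evaluate_protocol(batches, fit, predict, source_id=1):
    """Train on the source batch, then report accuracy on each remaining batch."""
    model = fit(batches[source_id])
    scores = {}
    for batch_id in sorted(batches):
        if batch_id == source_id:
            continue                      # the source batch is never a target
        data = batches[batch_id]
        hits = sum(1 for x, y in data if predict(model, x) == y)
        scores[batch_id] = hits / len(data)
    mean_acc = sum(scores.values()) / len(scores)
    return scores, mean_acc


def fit_threshold(samples):
    """Toy classifier: midpoint between the largest class-0 and smallest class-1 reading."""
    lo = max(x for x, y in samples if y == 0)
    hi = min(x for x, y in samples if y == 1)
    return (lo + hi) / 2.0


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Hypothetical batches of (reading, label) pairs; later batches drift upward.
toy_batches = {
    1: [(1.0, 0), (2.0, 1)],
    2: [(1.2, 0), (2.2, 1)],
    3: [(1.8, 0), (2.8, 1)],
}

scores, mean_acc = evaluate_protocol(toy_batches, fit_threshold, predict_threshold)
print(scores, mean_acc)
```

Reporting both the per-batch scores and their mean matters because a single average can hide a sharp accuracy drop on the most heavily drifted batches.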

Experiment on Sensor Drift Dataset B
The drift displacement electronic nose dataset was collected by Zhang Lei et al. from Chongqing University [20]. The dataset was collected using an array of electronic nose sensors of the same model. Experimental measurements included ammonia, benzene, carbon monoxide, formaldehyde, nitrogen dioxide, and toluene. Four TGS series gas sensors (TGS2602, TGS2620, TGS2201A, and TGS2201B) were used, as well as temperature and humidity sensors (STD2230-I2C of Sensirion in Switzerland). The dataset has 6-dimensional features for each sample and contains a total of 1604 samples, divided into 3 batches: master data, Slave 1 data, and Slave 2 data, where the master data were collected 5 years prior to the Slave 1 and Slave 2 data. Table 4 records the detailed data of this dataset. The scatter plot in Figure 6 shows the principal component analysis (PCA) of the dataset. Notably, the distributions of the slave systems differ significantly from those of the master system. We used the master data as the source domain of the model and the Slave 1 and Slave 2 datasets as the target domains. The proposed CDAN+SAM is compared with 11 popular transfer learning methods, and the classification results are presented in Table 5 and Figure 7. CDAN+SAM consistently achieves the best identification accuracy. Specifically, when compared with WDANN, which is similar to the proposed method, CDAN+SAM improves the average recognition rates by 6.21% and 13.82% for Tasks 1 and 2, respectively. Furthermore, for each batch, the parameters leading to the highest accuracy for the proposed method are detailed in Table 6. The feature extraction network is a CNN. Since the features of this dataset are 6-dimensional, no deeper network is needed to extract the features.

The Sensitivity of CDAN+SAM to Different Magnitudes of Drift
CDAN+SAM achieves more than 85% accuracy on the first 7 batches of Dataset A, which indicates that the method compensates well for short-term drift. For the last 3 batches, where the drift spans more than 2 years, the accuracy is mostly lower than 80% (except for Batch 9) due to the severe drift of the data, but it is still higher than that of the other 12 methods. This indicates that CDAN+SAM can handle both short-term and longer-term drift well.
Compared with Dataset A, Dataset B has a larger time span and deeper drift, so the average compensation accuracies obtained by all the methods on Dataset B are lower than those obtained on Dataset A. However, CDAN+SAM achieves the best results on both slave datasets, which shows that the method can deal with more complex and deeper drift scenarios.

Ablation Study
To comprehensively analyze the role of the SAM component in CDAN+SAM, we conducted ablation experiments under two settings on both Dataset A and Dataset B utilizing CDAN+SAM.
Setting 1: To demonstrate the importance of CDAN in extracting features common to both source and target data, the term CDAN in CDAN+SAM was replaced with DANN.DANN, in contrast to CDAN, solely considers the distinctions between source and target domain data, overlooking the differences between various categories within the data.
Setting 2: To illustrate that the SAM optimizer contributes to smoothing the entire model for improved results, the SAM optimizer in CDAN+SAM was replaced with the SGD optimizer.
The results of the ablation experiments for these two settings are summarized in Tables 7 and 8. Ablation study histograms of accuracy on Dataset A and Dataset B are shown in Figures 8 and 9. The ablation study outcomes highlight that each component plays a crucial role in enhancing the domain adaptation capability of the CDAN+SAM model. The experiments emphasize that, in deep transfer learning, consideration should be given not only to the distinctions between the source and target domain data but also to the differences among various categories within the data. Furthermore, the SAM optimizer proves effective in smoothing the adversarial model, leading to superior results.

Conclusions
This paper presents a novel framework, CDAN+SAM, for gas sensor drift compensation. Traditional machine learning approaches face challenges in solving the sensor drift problem, which is mainly attributed to the aging of gas-sensitive materials leading to inconsistencies in the probability distributions of calibration and measurement data. In this case, the proposed CDAN+SAM framework excels in capturing the common features of the drifted and clean data, as the model considers not only the relationship between the drifted and clean data, but also the relationship between the data of different gas species. The SAM optimizer used in CDAN+SAM mitigates the challenges associated with traditional deep transfer learning, such as training difficulty and convergence problems. Experimental results demonstrate the superior performance of CDAN+SAM,

Figure 1. The basic structure of the CDAN+SAM model.

Figure 2. (a) The structure of the traditional domain adversarial loss. (b) The structure of the conditional adversarial loss.

Figure 4. PCA scatter diagram of Dataset A.

Figure 5. Histogram of the recognition effects of some of the algorithms in Dataset A.

Figure 7. Histogram of the recognition effects of some of the algorithms in Dataset B.

Figure 8. Histogram of accuracy in ablation study Dataset A.

Figure 9. Histogram of accuracy in ablation study Dataset B.

Table 1. Benchmark sensor drift dataset from UCSD.

Table 4. Data description of the complex E-nose data.

Table 6. Parameter values of CDAN+SAM on Dataset B.